Report version: v1.1
Overview
This section provides an overview of the imported dataset. Dataset statistics, variable types, a missing data profile and potential alerts are shown below.
| Dataset statistic | Count |
|---|---|
| Discrete variable | 23 |
| Continuous variable | 4 |
| All missing variable | 0 |

| Alert | Detail |
|---|---|
| exitus_dt has 90657 (90.7%) missing values | |
| dose_3_brand_cd has 90194 (90.2%) missing values | |
| dose_3_dt has 90230 (90.2%) missing values | |
| fully_vaccinated_dt has 91780 (91.8%) missing values | |
| The variable ‘person_id’ does not have all unique values | Number of duplicate values: 4999 |
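The alerts above report two kinds of figures: the share of missing values per variable and the number of duplicated identifiers. The sketch below shows one way such figures could be computed. It is an illustration only: the helper names `missing_profile` and `duplicate_count` are hypothetical, the toy rows are not taken from the dataset, and the report's own counting convention for duplicates is assumed, not confirmed.

```python
from collections import Counter


def missing_profile(rows, column):
    """Return (count, percent) of missing (None) values in one column."""
    values = [row.get(column) for row in rows]
    n_missing = sum(v is None for v in values)
    pct = 100.0 * n_missing / len(values) if values else 0.0
    return n_missing, round(pct, 1)


def duplicate_count(rows, column):
    """Count values that repeat an earlier value in the column.

    Assumption: every occurrence after the first counts as one duplicate,
    and missing values count as repeats of each other.
    """
    counts = Counter(row.get(column) for row in rows)
    return sum(c - 1 for c in counts.values())


# Toy data for illustration, not taken from the report.
rows = [
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "B", "exitus_dt": "2021-03-01"},
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "C", "exitus_dt": None},
]

print(missing_profile(rows, "exitus_dt"))
print(duplicate_count(rows, "person_id"))
```

Under this convention, a column in which many records share a missing identifier would also show a high duplicate count, so it can be worth checking both alerts together.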
Variables
This section provides more detailed information per variable in the imported dataset.
Class of the variable: character
More than 100 distinct values
Class of the variable: character
Class of the variable: integer
More than 100 distinct values
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Class of the variable: logical
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
More than 100 distinct values
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: logical
Class of the variable: integer
Class of the variable: integer
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Class of the variable: integer
Class of the variable: Date
More than 100 distinct values
Class of the variable: logical
Compliance with the Common Data Model specification
We check whether the imported dataset complies with the data model specification (https://docs.google.com/spreadsheets/d/1Eva2ucg_M0WaDkCaF7qfBxk2DwTlUac9gKuP3xck4rw/edit#gid=0).
To comply with the data model, the dataset must pass a set of validation rules. The data are tested against each rule, and the results are summarized in the table below.
| Validation rule | Rule name | Items | Passes | Fails | Percentage of fails | Number of NAs | Percentage of NAs | Error | Warning |
|---|---|---|---|---|---|---|---|---|---|
| `is.na(sex_cd) \| sex_cd %vin% c("0", "1", "2", "9")` | V01 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(age_nm) \| age_nm - 18 >= -1e-08 & age_nm - 115 <= 1e-08` | V02 | 100000 | 85378 | | 14.62% | 0 | | | |
| `is.na(age_cd) \| age_cd %vin% c("0-18", "18-25", "25-35", "35-45", "45-55", "55-65", "65-75", "75-85", "85-95", "95-105", "105-115")` | V03 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(exitus_bl) \| exitus_bl %vin% c(TRUE, FALSE)` | V04 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(education_level_cd) \| education_level_cd %vin% c("Low", "Middle", "High")` | V05 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(income_category_cd) \| income_category_cd %vin% c("Low", "Middle", "High")` | V06 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(migration_background_cd) \| migration_background_cd %vin% c("NATIVE", "EU", "NON-EU", "PAR")` | V07 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(household_type_cd) \| household_type_cd %vin% c("ALONE", "COUPLE", "COUPLE_CHILD", "LONE", "EXTENDED", "OTHER")` | V08 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(hospi_due_to_covid_bl) \| hospi_due_to_covid_bl %vin% c(TRUE, FALSE)` | V09 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(test_positive_to_covid_nm) \| test_positive_to_covid_nm - 0 >= -1e-08 & test_positive_to_covid_nm - 50 <= 1e-08` | V10 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(test_nm) \| test_nm - 0 >= -1e-08 & test_nm - 50 <= 1e-08` | V11 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(dose_1_brand_cd) \| dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V12 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(dose_2_brand_cd) \| dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V13 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(dose_3_brand_cd) \| dose_3_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V14 | 100000 | 100000 | | 0% | 0 | | | |
| `is.na(doses_nm) \| doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08` | V15 | 100000 | 100000 | | 0% | 0 | | | |
| `(is.na(dose_1_dt) & is.na(dose_2_dt)) \| is.na(dose_2_dt) \| !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)` | V16 | 100000 | 55008 | | 44.99% | 0 | | | |
| `(is.na(dose_2_dt) & is.na(dose_3_dt)) \| is.na(dose_3_dt) \| !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)` | V17 | 100000 | 94807 | | 5.19% | 0 | | | |
| `is.na(fully_vaccinated_dt) \| is.na(exitus_dt) \| !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt` | V18 | 100000 | 99622 | | 0.38% | 0 | | | |
| `(!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) \| (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) \| (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) \| (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)` | V19 | 100000 | 81383 | | 13.86% | 4754 | | | |
| `is.na(dose_1_dt) \| (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V20 | 100000 | 95242 | | 4.76% | 0 | | | |
| `is.na(dose_2_dt) \| (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V21 | 100000 | 87752 | | 12.25% | 0 | | | |
| `is.na(dose_3_dt) \| (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V22 | 100000 | 97824 | | 2.18% | 0 | | | |
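Rule V19 is the only rule with a non-zero number of NAs. A plausible explanation, assuming the rules follow R's three-valued logic, is that a comparison on a missing `doses_nm` yields NA, and an NA survives the `&` and `|` operators unless another operand settles the result. The Python sketch below mirrors that behaviour, with `None` standing in for NA. The helpers `k_and`, `k_or`, `approx_eq`, `at_least` and `rule_v19` are hypothetical names introduced for illustration; they are not part of the tool that produced this report.

```python
def k_and(*args):
    """Three-valued AND: any False wins, otherwise any None gives None."""
    if any(a is False for a in args):
        return False
    if any(a is None for a in args):
        return None
    return True


def k_or(*args):
    """Three-valued OR: any True wins, otherwise any None gives None."""
    if any(a is True for a in args):
        return True
    if any(a is None for a in args):
        return None
    return False


def approx_eq(x, target, tol=1e-8):
    """Mirror abs(x - target) <= tol; None (NA) when x is missing."""
    return None if x is None else abs(x - target) <= tol


def at_least(x, target, tol=1e-8):
    """Mirror x - target >= -tol; None (NA) when x is missing."""
    return None if x is None else (x - target) >= -tol


def rule_v19(dose_1_dt, dose_2_dt, dose_3_dt, doses_nm):
    """Sketch of rule V19: dose dates must agree with the dose count."""
    # is.na() never returns NA, so these three flags are plain booleans.
    na1, na2, na3 = dose_1_dt is None, dose_2_dt is None, dose_3_dt is None
    return k_or(
        k_and(not na1, not na2, not na3, at_least(doses_nm, 3)),
        k_and(not na1, not na2, na3, approx_eq(doses_nm, 2)),
        k_and(not na1, na2, na3, approx_eq(doses_nm, 1)),
        k_and(na1, na2, na3, approx_eq(doses_nm, 0)),
    )


# Toy records for illustration (dates as strings; only missingness matters here).
print(rule_v19("2021-01-01", None, None, 1))     # consistent record
print(rule_v19("2021-01-01", None, None, 2))     # count disagrees with dates
print(rule_v19("2021-01-01", None, None, None))  # missing dose count
```

In this sketch a record with a missing dose count is neither a pass nor a fail, because exactly one branch matches its date pattern and that branch cannot be decided without the count.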
The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’ for each rule.
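The three percentages can be derived from per-record outcomes by simple counting. The sketch below illustrates this with a hypothetical `summarize` helper, where `True`, `False` and `None` stand for passing, failing and missing. The example reuses the V19 figures from the table (100000 items, 81383 passes, 4754 NAs); the fail count of 13863 is an assumption, inferred as items minus passes minus NAs, since it is not printed in the table.

```python
def summarize(results):
    """Count passing, failing and missing outcomes and their percentages."""
    results = list(results)
    items = len(results)

    def pct(n):
        # Percentage of all items, rounded to two decimals.
        return round(100.0 * n / items, 2) if items else 0.0

    passes = sum(r is True for r in results)
    fails = sum(r is False for r in results)
    nas = sum(r is None for r in results)
    return {
        "items": items,
        "passes": passes,
        "fails": fails,
        "nas": nas,
        "pct_passes": pct(passes),
        "pct_fails": pct(fails),
        "pct_nas": pct(nas),
    }


# V19 figures from the table; the fail count is inferred, not reported.
outcomes = [True] * 81383 + [False] * 13863 + [None] * 4754
summary = summarize(outcomes)
print(summary["pct_fails"], summary["pct_nas"])
```

With the inferred fail count, the computed percentage of fails rounds to the 13.86% shown in the table, which is consistent with fails being counted separately from NAs.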